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The Advanced LIGO and Advanced Virgo gravitational wave (GW) detectors will begin operation 
in the coming years, with compact binary coalescence events a likely source for the first detections. 

The gravitational waveforms emitted directly encode information about the sources, including the 
masses and spins of the compact objects. Recovering the physical parameters of the sources from the 
GW observations is a key analysis task. This work describes the LALInference software library for 
Bayesian parameter estimation of compact binary signals, which builds on several previous methods 
to provide a well-tested toolkit which has already been used for several studies. 

We show that our implementation is able to correctly recover the parameters of compact binary 
signals from simulated data from the advanced GW detectors. We demonstrate this with a detailed 
comparison on three compact binary systems: a binary neutron star (BNS), a neutron star - black 
hole binary (NSBH) and a binary black hole (BBH), where we show a cross-comparison of results 
obtained using three independent sampling algorithms. These systems were analysed with non- 
spinning, aligned spin and generic spin configurations respectively, showing that consistent results 
can be obtained even with the full 15-dimensional parameter space of the generic spin conhgurations. 

We also demonstrate statistically that the Bayesian credible intervals we recover correspond to 
frequentist confidence intervals under correct prior assumptions by analysing a set of 100 signals 
drawn from the prior. 

We discuss the computational cost of these algorithms, and describe the general and problem- 
specific sampling techniques we have used to improve the efficiency of sampling the compact binary 
coalescence (GBG) parameter space. 

PACS numbers: 02.50.Tt, 04.30.-w, 95.85.Sz 


I. INTRODUCTION 

The direct observation of GWs and the study of rela- 
tivistic sources in this new observational window is the 
focus of a growing effort with broad impact on astron- 
omy and fundamental physics. The network of GW laser 
interferometers - LIGO [I], Virgo [2] and GEO 600 [3] 
- completed science observations in initial conhguration 
in 2010, setting new upper-limits on a broad spectrum 
of GW sources. At present, LIGO and Virgo are being 


upgraded to their advanced conhgurations [4, 5], a new 
Japanese interferometer, KAGRA (formerly known as 
the Large-Scale Gravitational-wave Telescope, LGGT) [6] 
is being built, and plans are underway to relocate one of 
the LIGO instruments upgraded to Advanced LIGO sen- 
sitivity to a site in India (LIGO-India) [7]. Advanced 
LIGO is currently on track to resume science observa- 
tions in 2015 with Advanced Virgo following soon af- 
ter [8]; around the turn of the decade LIGO-India and 
KAGRA should also join the network of ground-based 
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instruments. 

Along with other possible sources, advanced ground- 
based interferometers are expected to detect GWs gen- 
erated during the last seconds to minutes of life of 
extra-galactic compact binary systems, with neutron 
star and/or black hole component masses in the range 
~ 1 Mq — 100 Mq. The current uncertainties on some 
of the key physical processes that affect binary forma- 
tion and evolution are reflected in the expected detection 
rate, which spans three orders of magnitude. However, 
by the time interferometers operate at design sensitivity, 
between one observation per few years and hundreds of 
observations per year are anticipated [8, 9], opening new 
avenues for studies of compact objects in highly relativis- 
tic conditions. 

During the approximately ten years of operation of 
the ground-based GW interferometer network, analysis 
development efforts for binary coalescences have been 
focused on the detection problem, and rightly so: how 
to unambiguously identify a binary coalescence in the 
otherwise overwhelming instrumental noise. The most 
sensitive compact binary searches are based on matched- 
filtering techniques, and are designed to keep up with 
the data rate and promptly identify detection candi- 
dates [10, 11]. A confirmation of the performance of 
detection pipelines has been provided by the “blind in- 
jection challenge” in which a synthetic compact binary 
coalescence signal was added (unknown to the analysis 
teams) to the data stream and successfully detected [12]. 

Once a detection candidate has been isolated, the next 
step of the analysis sequence is to extract full informa- 
tion regarding the source parameters and the underlying 
physics. With the expected detection of GWs in the com- 
ing years, this part of the analysis has become the focus 
of a growing number of studies. 

The conceptual approach to inference on the GW sig- 
nal is deeply rooted in the Bayesian framework. This 
framework makes it possible to evaluate the marginal- 
ized posterior probability density functions (PDFs) of the 
unknown parameters that describe a given model of the 
data and to compute the so-called evidence of the model 
itself. It is well known that Bayesian inference is compu- 
tationally costly, making the efficiency of the PDF and 
evidence calculations an important issue. For the case 
of coalescing binary systems the challenge comes from 
many fronts: the large number of unknown parameters 
that describe a model (15 parameters to describe a grav- 
itational waveform emitted by a binary consisting of two 
point masses in a circular orbit assuming that general 
relativity is accurate, plus other model parameters to ac- 
count for matter effects in the case of neutron stars, the 
noise, instrument calibration, etc.), complex multi-modal 
likelihood functions, and the computationally intensive 
process of generating waveforms. 

Well known stochastic sampling techniques -- Markov 
chain Monte Garlo [13-21], Nested Sampling [22, 23] and 
MultiNest/BAMBI [24-27] - have been used in recent 
years to develop algorithms for Bayesian inference on 


GW data aimed at studies of coalescing binaries. An 
underlying theme of this work has been the comparison 
of these sampling techniques and the cross-validation of 
results with independent algorithms. In parallel, the in- 
ference effort has benefited from a number of advances 
in other areas that are essential to maximise the science 
exploitation of detected GW signals, such as waveform 
generation and standardised algorithms and libraries for 
the access and manipulation of GW data. The initially 
independent developments have therefore progressively 
converged towards dedicated algorithms and a common 
infrastructure for Bayesian inference applied to GW ob- 
servations, specifically for coalescing binaries but appli- 
cable to other sources. These algorithms and infrastruc- 
ture are now contained in a dedicated software package: 
LALInf erence. 

The goal of this paper is to describe LALInf erence. 
We will cover the details of our implementation, designed 
to overcome the problems faced in performing Bayesian 
inference for GW observations of GBC signals. This 
includes three independent sampling techniques which 
were cross-compared to provide confidence in the re- 
sults that we obtain for GBC signals, and compared with 
known analytical probability distributions. We describe 
the post-processing steps involved in converting the out- 
put of these algorithms to meaningful physical state- 
ments about the source parameters in terms of credi- 
ble intervals. We demonstrate that these intervals are 
well-calibrated measures of probability through a Monte 
Carlo simulation, wherein we confirm the quoted prob- 
ability corresponds to frequency under correct prior as- 
sumptions. We compare the computational efficiency of 
the different methods and mention further enhancements 
that will be required to take full advantage of the ad- 
vanced GW detectors. 

The LALInf erence software consists of a C library and 
several post-processing tools written in python. It lever- 
ages the existing LSC Algorithm Library (LAL) to pro- 
vide 

• Standard methods of accessing GW detector data, 
using LAL methods for estimating the power spec- 
tral density (PSD), and able to simulate stationary 
Gaussian noise with a given noise curve. 

• the ability to use all the waveform approximants in- 
cluded in LAL that describe the evolution of point- 
mass binary systems, and waveforms under devel- 
opment to account for matter effects in the evolu- 
tion of binary neutron stars and generalisations of 
waveforms beyond general relativity; 

• Likelihood functions for the data observed by a net- 
work of ground-based laser interferometers given a 
waveform model and a set of model parameters; 

• Three independent stochastic sampling techniques 
of the parameter space to compute PDFs and evi- 
dence; 
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• Dedicated “jump proposals” to efficiently select 
samples in parameter space that take into account 
the specific structure of the likelihood function; 

• Standard post-processing tools to generate proba- 
bility credible regions for any set of parameters. 

During the several years of development, initial imple- 
mentations of these Bayesian inference algorithms and 
LALInf erence have been successfully applied to a variety 
of problems, such as the impact of different network con- 
figurations on parameter estimation [28], the ability to 
measure masses and spins of compact objects [17, 29, 30], 
to reconstruct the sky location of a detected GW bi- 
nary [19, 31, 32] and the equation of state of neutron 
stars [33], the effects of calibration errors on informa- 
tion extraction [34] and tests of general relativity [35- 
37]. Most notably LALInf erence has been at the heart 
of the study of detection candidates, including the blind 
injection, during the last LIGO/ Virgo science run [38], 
and has been used for the Numerical INJection Analysis 
project NINJA2 [39]. It has been designed to be flexible 
in the choice of signal model, allowing it to be adapted for 
analysis of signals other than compact binaries, includ- 
ing searches for continuous waves [40], and comparison 
of core-collapse supernova models based on [41]. 

The paper is organised as follows: Section II pro- 
vides a summary of the key concepts of Bayesian in- 
ference, and specific discussion about the many wave- 
form models that can be used in the analysis and 
the relevant prior assumptions. In Section III we de- 
scribe the conceptual elements concerning the general 
features of the sampling techniques that are part of 
LALInf erence: Markov chain Monte Garlo, Nested Sam- 
pling and MultiNest/BAMBI. Section IV deals with 
the problem of providing marginalized probability func- 
tions and (minimum) credible regions at a given confi- 
dence level from a finite number of samples, as is the 
case of the outputs of these algorithms. In Section V we 
summarise the results from extensive tests and valida- 
tions that we have carried out by presenting representa- 
tive results on a set of injections in typical regions of the 
parameter space, as well as results obtained by running 
the algorithms on known distributions. This section is 
complemented by Section VI in which we consider effi- 
ciency issues, and we report the run-time necessary for 
the analysis of coalescing binaries in different cases; this 
provides a direct measure of the latency timescale over 
which fully coherent Bayesian inference results for all the 
source parameters will be available after a detection can- 
didate is identified. Section VII contains our conclusions 
and pointers to future work. 


II. BAYESIAN ANALYSIS 

We can divide the task of performing inference about 
the GW source into two problems: using the observed 
data to constrain or estimate the unknown parameters 


of the source ^ under a fixed model of the waveform (pa- 
rameter estimation), and deciding which of several mod- 
els is more probable in light of the observed data, and by 
how much (model selection). We tackle both these prob- 
lems within the formalism of Bayesian inference, which 
describes the state of knowledge about an uncertain hy- 
pothesis H as a probability, denoted P{H) £ [0,1], or 
about an unknown parameter as a probability density, 
denoted p{9\H), where / p{9\H)d6 = 1. Parameter es- 
timation can then be performed using Bayes’ theorem, 
where a prior probability distribution p{9\H) is updated 
upon receiving the new data d from the experiment to 
give a posterior distribution p{9\d,H), 


p{9\d, H) 


p{9\H)p{d\9,H) 

p{d\H) 


( 1 ) 


Models typically have many parameters, which we col- 
lectively indicate with 9 = {9\,92t ■ ■ ,9^} ■ The joint 
probability distribution on the multi-dimensional space 
p{6\d,H) describes the collective knowledge about all 
parameters as well as their relationships. Results for a 
specific parameter are found by marginalising over the 
unwanted parameters. 


p{9i\d,H)= J d92 ■ ■ .d9Np{9\d,H) . (2) 


The probability distribution can be used to find the ex- 
pectation of various functions given the distribution, e.g. 
the mean 

{Oi) = J 9ip{9i\d, H)d9i , (3) 

and credible regions, an interval in parameter space that 
containing a given probability (see Section IV). 

Model selection is performed by comparing the fully 
marginalized likelihood, or ‘evidence’, for different mod- 
els. The evidence, usually denoted Z, is simply the inte- 
gral of the likelihood, L{d\9) = p{d\9,H), multiplied by 
the prior over all parameters of the model H, 


Z = p{d\H) = J d9i . . ,d9N p{d\9,H)p{9\H). (4) 


This is the normalisation constant that appears in the 
denominator of Eq. (1) for a particular model. Because 
we cannot exhaustively enumerate the set of exclusive 
models describing the data, we typically compare two 
competing models. To do this, one computes the ratio of 
posterior probabilities 


, pmd) pm z, 

P{HM) P{Hj) z, 


(5) 


^ The whole set of unknown parameters of the model can also 
contain parameters not related to the source, such as noise and 
calibration parameters [42-45]. 


4 


where Bij = ZijZj is the ‘Bayes Factor’ between the two 
competing models i and j, which shows how much more 
likely the observed data d is under model i rather than 
model j. 

While the Bayesian methods described above are con- 
ceptually simple, the practical details of performing an 
analysis depend greatly on the complexity and dimen- 
sionality of the model, and the amount of data that 
is analysed. The size of the parameter space and the 
amount of data to be considered mean that the result- 
ing probability distribution cannot tractably be analysed 
through a fixed sampling of the parameter space. In- 
stead, we have developed methods for stochastically sam- 
pling the parameter space to solve the problems of pa- 
rameter estimation and model selection, based on the 
Markov chain Monte Carlo (MCMC) and Nested Sam- 
pling techniques, the details of which are described in 
section III. Next we will describe the models used for the 
noise and the signal. 


A. Data model 

The data obtained from the detector is modelled as 
the sum of the compact binary coalescence signal h (de- 
scribed in section II B) and a noise component n. 


d — h + n. (6) 

Data from multiple detectors in the network are anal- 
ysed coherently, by calculating the strain that would be 
observed in each detector: 

h = F+{a,S,tlj)h+ + F^{a,S,i;)h^ (7) 

where /i+.x are the two independent GW polarisation 
amplitudes and F^^y_{a,5,tl)) are the antenna response 
functions ([e.g. 46]) that depend on the source location 
and the polarisation of the waves. Presently we ignore 
the time dependence of the antenna response function 
due to the rotation of the Earth, instead assuming that 
it is constant throughout the observation period. This is 
justifiable for the short signals considered here. Work is 
ongoing to include this time dependence when analysing 
very long signals with a low frequency cutoff below 40 Hz, 
to fully exploit the advanced detector design sensitiv- 
ity curves. The waveforms h+ ^ are described in Sec- 
tion II B. 

As well as the signal model, which is discussed in the 
next section, we must include a description of the ob- 
served data, including the noise, which is used to create 
the likelihood function. This is where knowledge of the 
detectors’ sensitivity and the data processing procedures 
are folded into the analysis. 

We perform all of our analyses using the calibrated 
strain output of the GW detectors, or a simulation 
thereof. This is a set of time-domain samples di sam- 
pled uniformly at times ti, which we sometimes write as 
a vector d for convenience below. To reduce the volume 


of data, we down-sample the data from its original sam- 
pling frequency (16384 Hz) to a lower rate fs > 2/max, 
which is high enough to contain the maximum frequency 
/max of the lowest mass signal allowed by the prior, typ- 
ically fs = 4096 Hz when analysing the inspiral part of a 
BNS signal. To prevent aliasing the data is first low-pass 
filtered with a 20th order Butterworth filter with atten- 
uation of 0.1 at the new Nyquist frequency, using the 
implementation in LAL [47] , which preserves the phase of 
the input. We wish to create a model of the data that 
can be used to perform the analysis. In the absence of 
a signal, the simplest model which we consider is that of 
Gaussian, stationary noise with a certain power spectral 
density S'„(/) and zero mean. 5'„(/) can be estimated 
using the data adjacent to the segment of interest, which 
is normally selected based on the time of coalescence tc 
of a candidate signal identified by a search pipeline. The 
analysis segment d spans the period [tc — T -|-2, tc + 2], i.e. 
a time T which ends two seconds after the trigger (the 
2 s safety margin after tc allows for inaccuracies in the 
trigger time reported by the search, and should encom- 
pass any merger and ringdown component of the signal) . 
To obtain this estimate, by default we select a period of 
time (1024 s normally, but shorter if less science data is 
available) from before the time of the trigger to be anal- 
ysed, but ending not later than tc — T, so it should not 
contain the signal of interest. This period is divided into 
non-overlapping segments of the same duration T as the 
analysis segment, which are then used to estimate the 
PSD. Each segment is windowed using a Tukey window 
with a 0.4 s roll-off, and its one-sided noise power spec- 
trum is computed. For each frequency bin the median 
power over all segments is used as an estimate of the 
PSD in that bin. We follow the technique of [48] by us- 
ing the median instead of the mean to provide some level 
of robustness against large outliers occurring during the 
estimation time. 

The same procedure for the PSD estimation segments 
is applied to the analysed data segment before it is used 
for inference, to ensure consistency. 

For each detector we assume the noise is stationary, 
and characterised only by having zero mean and a known 
variance (estimated from the power spectrum). Then 
the likelihood function for the noise model is simply the 
product of Gaussian distributions in each frequency bin 


p{d\HN,Snif)) = exp^ 

i 


TSn{h) 


^ log(7^TSM^)/2) 
(8) 


where d is the discrete Fourier transform of d 

dj = dk exp(-27rijfe/A). (9) 

k 


The presence of an additive signal h in the data simply 
adjusts the mean value of the distribution, so that the 


5 


likelihood including the signal is 


p{d\Hs,Sn{f),9) = exp^ 

i 


2|h,(0)- 

TS^{U) 


-\\og{^TSMi)m • 


(10) 


To analyse a network of detectors coherently, we make the 
further assumption that the noise is uncorrelated in each. 
This allows us to write the coherent network likelihood 
for data obtained from each detector as the product of 
the likelihoods in each detector [49]. 

P{d{H.Ly}\HsTSnl^H,L,V}{f)) = n 

( 11 ) 

This gives us the default likelihood function which is used 
for our analyses, and has been used extensively in previ- 
ous work. 


1. Marginalising over uncertainty in the PSD estimation 

Using a fixed estimate of the PSD, taken from times 
outside the segment being analysed, cannot account 
for slow variations in the shape of the spectrum over 
timescales of minutes. We can model our uncertainty in 
the PSD estimate by introducing extra parameters into 
the noise model which can be estimated along with the 
signal parameters; we follow the procedure described in 
[43]. We divide the Fourier domain data into ^ 8 log- 
arithmically spaced segments, and in each segment j, 
spanning Nj frequency bins, introduce a scale parame- 
ter r]j{fi) which modifies the PSD such that S'„(/i) — t 
Sn{fi)Vj lor ij < i < tj+ij where the scale parameter 
is constant within a frequency segment. With these ad- 
ditional degrees of freedom included in our model, the 
likelihood becomes 


p{d\Hs,Sn{f),9,v) = exp^ 

i 


2\h,{e) - 
Tp{fi)Sr,{h) 


--log(7r77,T5'„(/i)/2) 


(12) 


The prior on pj is a normal distribution with mean 1 
and variance 1/A(, . In the limit Nj — >■ 1 (i.e., there is 
one scale parameter for each Fourier bin), replacing the 
Gaussian prior with an inverse chi-squared distribution 
and integrating p{d\Hs,Sn{f),0,'q) x p(0, rjJiJs, 5„(/)) 
over rj, we would recover the Student’s t-distribution like- 
lihood considered for GW data analysis in [42, 50]. For a 
thorough discussion of the relative merits of Student’s t- 
distribution likelihood and the approach used here, as 
well as examples which show how including r] in the 
model improves the robustness of parameter estimation 
and model selection results, see [43]. In summary, the 


likelihood adopted here offers more flexibility given how 
much the noise can drift between the data used for esti- 
mating the PSD and the data being analysed. Further 
improvements on this scheme using more sophisticated 
noise models are under active development. 


B. Waveform models 


There are a number of different models for the GW 
signal that is expected to be emitted during a compact- 
binary merger. These models, known as waveform 
families, differ in their computational complexity, the 
physics they simulate, and their regime of applicability. 
LALInference has been designed to easily interface with 
arbitrary waveform families. 

Each waveform family can be thought of as a func- 
tion that takes as input a parameter vector 6 and pro- 
duces as output (^)i either a time domain h{6] t) or 
frequency-domain h{6] f) signal. The parameter vector 
0 generally includes at least nine parameters: 

• Component masses m\ and m 2 - We use a 
reparametrisation of the mass plane into the chirp 
mass, 

M = (mim2)^^®(mi 4 - 7712 )“^^® (13) 

and the asymmetric mass ratio 

q = 7712 / 7711 , (14) 


as these variables tend to be less correlated and 
easier to sample. We use the convention m\ > m 2 
when labelling the components. The prior is trans- 
formed accordingly (see figure I). Another possible 
parametrisation is the symmetric mass ratio 


7? = 


(TO 1 TO 2 ) 
(mi -I- m2Y 


(15) 


although we do not use this when sampling the 
distribution since the Jacobian of the transforma- 
tion to mi, m 2 coordinates becomes singular at 
mi = m 2 . 


• The luminosity distance to the source 

• The right ascension a and declination 5 of the 
source; 

• The inclination angle between the system’s or- 
bital angular momentum and the line of sight. For 
aligned- and non-spinning systems this coincides 
with the angle 0 jat between the total angular mo- 
mentum and the line of sight (see below). We will 
use the more general Ojjq throughout the text. 

• The polarisation angle ip which describes the ori- 
entation of the projection of the binary’s orbital 
momentum vector onto the plane on the sky, as 
defined in [46]; 
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• An arbitrary reference time tc, e.g. the time of 
coalescence of the binary; 

• The orbital phase </>c of the binary at the reference 
time tc- 

Nine parameters are necessary to describe a circular bi- 
nary consisting of point-mass objects with no spins. If 
spins of the binary’s components are included in the 
model, they are described by six additional parameters, 
for a total of 15: 

• dimensionless spin magnitudes a^, defined as Ui = 
\si\/mf and in the range [0, 1], where Sj is the spin 
vector of the object i, and 

• two angles for each s* specifying its orientation with 
respect to the plane defined by the line of sight and 
the initial orbital angular momentum. 

In the special case when spin vectors are assumed to be 
aligned or anti-aligned with the orbital angular momen- 
tum, the four spin-orientation angles are fixed, and the 
spin magnitudes alone are used, with positive (negative) 
signs corresponding to aligned (anti-aligned) configura- 
tions, for a total of 11 parameters. In the case of pro- 
cessing waveforms, the system-frame parametrisation has 
been found to be more efficient than the radiation frame 
typically employed for parameter estimation of processing 
binaries. The orientation of the system and its spinning 
components are parameterised in a more physically in- 
tuitive way that concisely describes the relevant physics, 
and defines evolving quantities at a reference frequency 
of 100 Hz, near the peak sensitivity of the detectors [51]: 

• OjM'- The inclination of the system’s total angular 
momentum with respect to the line of sight; 


of the whole inspiral-merger-ringdown are important in 
choosing the appropriate length of the data segment 
to analyse and the bandwidth necessary to capture the 
whole radiation. At the leading Newtonian quadrupole 
order, the time to coalescence of a binary emitting GWs 
at frequency / is [48]: 


T = 93.9 


/ 

30 Hz 


- 8/3 


M 

0.87 Mq 


- 5/3 


sec . 


(16) 


Here we have normalised the quantities to an mi = m 2 = 
1 Mq equal mass binary. The frequency of dominant 
mode gravitational wave emission at the innermost stable 
circular orbit for a binary with non-spinning components 
is [48]: 


/isco — 


6^/^7r(mi -I- m 2 ) 


= 4.4 


Mq 


0 


mi -I- m 2 


kHz, (17) 


The low-frequency cut-off of the instrument, which sets 
the duration of the signal, was 40 Hz for LIGO in ini- 
tial/enhanced configuration and 30 Hz for Virgo. When 
the instruments operate in advanced configuration, new 
suspension systems are expected to provide increased 
low-frequency sensitivity and the low-frequency bound 
will progressively move towards ~ 20 Hz. The quanti- 
ties above define therefore the longest signals that one 
needs to consider and the highest frequency cut-off. The 
data analysed (the ‘analysed segment’) must include the 
entire length of the waveform from the desired starting 
frequency. 

Although any waveform model that is included in the 
LAL libraries can be readily used in LALInf erence, the 
most common waveform models used in our previous 
studies [e.g., 55] are: 


• ti,t 2 '- Tilt angles between the compact objects’ 
spins and the orbital angular momentum; 

• 4^12 ■ The complimentary azimuthal angle separat- 
ing the spin vectors; 

• The azimuthal position of the orbital angular 
momentum on its cone of precession about the total 
angular momentum. 

Additional parameters are necessary to fully describe 
matter effects in systems involving a neutron star, namely 
the equation of state [52] , or to model deviations from the 
post-Newtonian expansion of the waveforms [e.g. 36, 53], 
but we do not consider these here. Finally, additional 
parameters could be used to describe waveforms from ec- 
centric binaries [54] but these have not yet been included 
in our models. 

GWs emitted over the whole coalescence of two com- 
pact objects produce a characteristic “chirp” of increas- 
ing amplitude and frequency during the adiabatic inspi- 
ral phase, followed by a broad-band merger phase and 
then damped quasi-sinusoidal signals during the ring- 
down phase. The characteristic time and frequency scales 


• Frequency-domain stationary phase inspiral-only 
post-Newtonian waveforms for binaries with non- 
spinning components, particularly the TaylorF2 ap- 
proximant [56]; 

• Time-domain inspiral-only post-Newtonian wave- 
forms that allow for components with arbitrary, 
precessing spins, particularly the SpinTaylorT4 ap- 
proximant [57]; 

• Frequency-domain inspiral-merger-ringdown phe- 
nomenological waveform model calibrated to nu- 
merical relativity, IMRPhenomB, which describes 
systems with (anti)aligned spins [58]; 

• Time-domain inspiral-merger-ringdown effective- 
one-body model calibrated to numerical relativity, 
EOBNRv2 [59]. 

Many of these waveform models have additional options, 
such as varying the post-Newtonian order of amplitude 
or phase terms. Furthermore, when exploring the param- 
eter space with waveforms that allow for spins, we some- 
times find it useful to set one or both component spins to 
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zero, or limit the degrees of freedom by only considering 
spins aligned with the orbital angular momentum. 

We generally carry out likelihood computations in the 
frequency domain, so time-domain waveforms must be 
converted into the frequency domain by the discrete 
Fourier transform defined as in eq. (9). To avoid edge ef- 
fects and ensure that the templates and data are treated 
identically (see Section II A), we align the end of the time- 
domain waveform to the discrete time sample which is 
closest to tc and then taper it in the same way as the 
data (if the waveform is non-zero in the first or last 0.4s of 
the buffer) , before Fourier-transforming to the frequency 
domain and applying any finer time-shifting in the fre- 
quency domain, as described below. 


1. Analytic marginalisation over phase 

The overall phase (pc of the GW is typically of no as- 
trophysical interest, but is necessary to fully describe the 
signal. When the signal model includes only the funda- 
mental mode (Z = m = 2) of the GW it is possible to 
analytically marginalize over pc, simplifying the task of 
the inference algorithms in two ways. Firstly, the elimi- 
nation of one dimension makes the parameter space easier 
to explore; secondly the marginalized likelihood function 
over the remaining parameters has a lower dynamic range 
than the original likelihood. The desired likelihood func- 
tion over the remaining parameters is calculated by 
marginalising Eq. (10), 


Some of the parameters, which we call intrinsic param- 
eters (masses and spins), influence the evolution of the 
binary. Evaluating a waveform at new values of these pa- 
rameters generally requires recomputing the waveform, 
which, depending on the model, may involve purely ana- 
lytical calculations or a solution to a system of differen- 
tial equations. On the other hand, extrinsic parameters 
(sky location, distance, time and phase) leave the basic 
waveform unchanged, while only changing the detector 
response functions F+ and and shifting the relative 
phase of the signal as observed in the detectors. This al- 
lows us to save computational costs in a situation where 
we have already computed the waveform and are now in- 
terested in its re-projection and/or phase or time shift; in 
particular, this allows us to compute the waveform only 
once for an entire detector network, and merely change 
the projection of the waveform onto detectors. We typi- 
cally do this in the frequency domain. 

The dependence of the waveform on distance (scaling 
as I/di), sky location and polarisation (detector response 
described by antenna pattern functions F+^^{a,5^'p) 
for the -I- and x polarisations, see eq. (7)) and phase 
iji{pc) = is straightforward. A time 

shift by At corresponds to a multiplication h{At) = 
(i(O)e^'^®-^^* in the frequency domain; this time shift will 
be different for each detector, since the arrival time of a 
GW at the detector depends on the location of the source 
on the sky and the location of the detector on Earth. 

The choice of parameterization greatly influences the 
efficiency of posterior sampling. The most efficient pa- 
rameterizations minimize the correlations between pa- 
rameters and the number of isolated modes of the poste- 
rior. For the mass parameterization, the chirp mass M 
and asymmetric mass ratio q achieve this, while avoid- 
ing the divergence of the Jacobian of the symmetric mass 
ratio rj at equal masses when using a prior flat in com- 
ponent masses. With generically oriented spins comes 
precession, and the evolution of angular momentum ori- 
entations. In this case the structure of the posterior is 
simplified by specifying these parameters, chosen so that 
they evolve as little as possible, at a reference frequency 
of 100 Hz near the peak sensitivity of the detector [51]. 


p{d\Hs, 5„(/), ^^) = j p{Pc\Hs)p{d\e, Hs, SMWc 

(18) 

where p{pc\Hs) = 1/2tt is the uniform prior on phase. 

Starting from Eq. 11 we can write the likelihood for 
multiple detectors indexed j as 


(19) 

where Hq is the signal defined at a reference phase of 0. 
Using this definition, the integral of Eq. (18) can be cast 
into a standard form to yield 


p{dj\Hs,Snj{f),^) = 


exp 

2 + Mijp 

Iq 

4 

\ ^ ho^jd^j 



T 










in terms of the modihed Bessel function of the first kind 
Iq. Note that the marginalised likelihood is no longer ex- 
pressible as the product of likelihoods in each detector. 
We found that using the marginalized phase likelihood 
could reduce the computation time of a nested sampling 
analysis by a factor of up to 4, as the shape of the distri- 
bution was easier to sample, reducing the autocorrelation 
time of the chains. 


p{dj\Hs,Snj{f),0) ocexp 


T 


Sn,{h) 


X exp 




Sum 


C. Priors 

As shown in Eq. (1), the posterior distribution of d (or 
Q) depends both on the likelihood and prior distributions 
of 0. LALInference allows for flexibility in the choice 
of priors. For all analyses described here, we used the 
same prior density functions (and range). For component 
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FIG. 1; Prior probability p{mi,m 2 \Hs), uniform in component masses within the bounds shown (left), and the same distri- 
bution transformed into the A4,q parametrization used for sampling. 


masses, we used uniform priors in the component masses 
with the range 1 Mq < toi_2 < 30 Mq, and with the total 
mass constrained by mi -|-m 2 < 35 Mq, as shown in Fig. 
1. This range encompasses the low-mass search range 
used in [12] and our previous parameter estimation report 
[55], where 1 Mq < mi _2 < 24 Mq and mi -I- m 2 < 25 Mq. 
When expressed in the sampling variable A4 , q the prior 
is determined by the Jacobian of the transformation, 

p{M,q\I) (X Mm^'^ (21) 

which is shown in the right panel of figure 1. 

The prior density function on the location of the source 
was taken to be isotropically distributed on the sphere 
of the sky, with p{dL\Hs) oc cLl^ , from 1 Mpc out to 
a maximum distance chosen according to the detector 
configuration and the source type of interest. We used 
an isotropic prior on the orientation of the binary to 
give p{b,ip,4>c\Hs) oc sint. For analyses using waveform 
models that account for possible spins, the prior on the 
spin magnitudes, 01 , 02 , was taken to be uniform in the 
range [0, 1] (range [—1, 1] in the spin-aligned cases), and 
the spin angular momentum vectors were taken to be 
isotropic. 

The computational cost of the parameter estimation 
pipeline precludes us from running it on all data; there- 
fore, the parameter estimation analysis relies on an esti- 
mate of the coalescence time as provided by the detection 
pipeline [12]. In practice, a 200 ms window centered on 
the trigger time is sufficient to guard against the uncer- 
tainty and bias in the coalescence time estimates from 
the detection pipeline, see for instance [10, 60]. For the 
signal-to-noise ratios (SNRs) used in this paper, our pos- 
teriors are much narrower than our priors for most pa- 
rameters. 


III. ALGORITHMS 
A. MCMC 

Markov chain Monte Carlo methods are designed to 
estimate a posterior by stochastically wandering through 
the parameter space, distributing samples proportionally 
to the density of the target posterior distribution. Our 
MCMC implementation uses the Metropolis-Hastings al- 
gorithm [61, 62], which requires a proposal density func- 
tion Q{9'\6) to generate a new sample 0', which can only 
depend on the current sample 6. Such a proposal is ac- 
cepted with a probability = min(l, a), where 

^ Q{e\e')p{e’\d,H) 

Q{e'\e)p{e\d,H) ' ^ ^ 

If accepted, 9' is added to the chain, otherwise 9 is re- 
peated. 

Chains are typically started at a random location in pa- 
rameter space, requiring some number of iterations before 
dependence on this location is lost. Samples from this 
hurn-in period are not guaranteed to be draws from the 
posterior, and are discarded when estimating the pos- 
terior. Furthermore, adjacent samples in the chain are 
typically correlated, which is undesirable as we perform 
Kolmogorov-Smirnov tests of the sampled distributions, 
which requires independent samples. To remove this cor- 
relation we thin each chain by its integrated autocorre- 
lation time (ACT) r, defined defined as 

r=l-f2^c(t), (23) 

t 

where t labels iterations of the chain and c{t) is the Pear- 
son correlation coefficient between the chain of samples 
and itself shifted by t samples [63] The chain is thinned 
by using only every r-th sample, and the samples re- 
maining after burn-in and ACT thinning are referred to 
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as the effective samples. This is necessary for some post- 
processing checks which assume that the samples are sta- 
tistically independent. 

The efficiency of the Metropolis-Hastings algorithm is 
largely dependent on the choice of proposal density, since 
that is what governs the acceptance rates and ACTs. The 
standard, generically applicable distribution is a Gaus- 
sian centered on 6, the width of which will affect the 
acceptance rate of the proposal. Large widths relative 
to the scale of the target posterior distribution will lead 
to low acceptance rates with many repeated samples, 
whereas small widths will have high acceptance rates 
with highly correlated samples, both resulting in large 
ACTs. For a simplified setting of a unimodal Gaussian 
posterior, the optimal acceptance rate can be shown to 
be 0.234 [64]. Though our posterior can be more compli- 
cated, we find that targeting this acceptance rate gives 
good performance and consistent ACTs for all posteri- 
ors that we have considered. Therefore, during the first 
100,000 samples of a run, we adjust the ID Gaussian 
proposal widths to achieve this acceptance rate. This 
period of adjustment is re-entered whenever the sampler 
finds a log likelihood (log L) that is N/2 larger than has 
been seen before in a run, under the assumption that 
this increase in likelihood may indicate that a new area 
of parameter space is being explored. 

When the posterior deviates from a unimodal 
Gaussian-like distribution, using only the local Gaussian 
proposal becomes very inefficient. The posteriors encoun- 
tered in GW data analysis typically consists of multiple 
isolated modes, separated by regions of lower probabil- 
ity. To properly weigh these modes, a Markov chain must 
jump between them frequently, which is a very unlikely 
process when using only a local Gaussian proposal. In 
section IIIC we describe the range of jump proposals 
more adept at sampling the parameter space of a com- 
pact binary inspiral. We also describe the technique of 
parallel tempering, which we employ to ensure proper 
mixing of samples between the modes. 


1. Parallel Tempering 

Tempering [65, 66] introduces an inverse “tempera- 
ture” 1/T to the standard likelihood function, resulting 
in a modified posterior 


PT{e\d) ex p{e\H)L{e)T . (24) 

Increasing temperatures above T = 1 reduces the con- 
trast of the likelihood surface, broadening peaks, with 
the posterior approaching the prior in the high tempera- 
ture limit. Parallel tempering exploits this “flattening” of 
the posterior with increasing temperature by construct- 
ing an ensemble of tempered chains with temperatures 
spanning T = 1 to some finite maximum temperature 
2niax- Chains at higher temperatures sample a distri- 
bution closer to the prior, and are more likely to ex- 


plore parameter space and move between isolated modes. 
Regions of high posterior support found by the high- 
temperature chains are then passed down through the 
temperature ensemble by periodically proposing swaps 
in the locations of adjacent chains. Such swaps are ac- 
cepted at a rate Vs = min(l,u;y), where 


UJij 



(25) 


with Ti < Tj. 

For non-trivial posteriors this technique greatly in- 
creases the sampling efficiency of the T = 1 chain, but 
does so at a cost. In our implementation, samples with 
T > 1 are not used in construction of the final poste- 
rior distribution, but they are kept for calculation of ev- 
idence integrals via thermodynamic integration in post- 
processing IV C. 

All samples from chains with T > 1 are ultimately dis- 
carded, as they are not drawn from the target posterior. 
From a computational perspective however, each chain 
can run in parallel and not affect the total run time of the 
analysis. The MGMG implementation of LALInf erence, 
LALInf erenceMCMC, uses the Message Passing Interface 
(MPI) [67] to achieve this parallelization. In our calcu- 
lations, the temperatures Ti are distributed logarithmi- 
cally. Ghains are not forced to be in sync, and each chain 
proposes a swap in location with the chain above it (if 
one exists) every 100 samples. 


B. Nested Sampling 

Nested sampling is a Monte Garlo technique intro- 
duced by Skilling [22] for the computation of the Bayesian 
evidence that will also provide samples from the pos- 
terior distribution. This is done by transforming the 
multi-dimensional integral of Equation (4) into a one- 
dimensional integral over the prior volume. The prior 
volume is defined as X such that dV = d6p{0\H). There- 
fore, 

A(A) = [ d6p{9\H). (26) 

Jpid\0,H)>X 

This integral computes the total probability volume con- 
tained within a likelihood contour defined by p{d\6, H) = 
A. With this in hand. Equation (4) can now be written 
as 

Z = [ L{X)dX, (27) 

Jo 

where L{X) is the inverse of Equation (26) and is a mono- 
tonically decreasing function of X (larger prior volume 
enclosed implies lower likelihood value). By evaluating 
the likelihoods Li = L{Xi) associated with a monotoni- 
cally decreasing sequence of prior volumes A^, 

0 < Am < . . . < A2 < Ai < Ao = 1, (28) 
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FIG. 2: The profile of the likelihood function for each of the 
injections in Table II, mapped onto the fractional prior sup- 
port parameter X (see Eq. (28)). The algorithm proceeds 
from left (sampling entire prior) to right (sampling a tiny re- 
stricted part of the prior). The values of log(L) are normalised 
to the likelihood of the noise model. 


the evidence can be easily approximated with the trapez- 
ium rule, 


1 

Z = (29) 

Examples of the function L{X) for CBC sources are 
shown in figure 2. 

Applying this technique follows a fundamental set of 
steps. First, a set of initial ‘live’ points are sampled 
from the entire prior distribution. The point with the 
lowest likelihood value is then removed and replaced 
by a new sample with higher likelihood. This removal 
and replacement is repeated until a stopping condition 
has been reached. By default, the loop continues while 
LmaxXi/Zi > e°'^, where Lmax is the maximum like- 
lihood so far discovered by the sampler, Zi is the cur- 
rent estimate of the total evidence, and Xi is the frac- 
tion of the prior volume inside the current contour line. 
In short, this is checking whether the evidence estimate 
would change by more than a factor of ^ 0.1 if all the re- 
maining prior support were at the maximum likelihood. 
Posterior samples can then be produced by re-sampling 
the chain of removed points and current live points ac- 
cording to their posterior probabilities: 

p{e\d.H) = i^XzlXX}Xi^ ( 30 ) 

The estimation of the prior volume and method for effi- 
ciently generating new samples varies between implemen- 
tations. In LALInference we have included two such 
implementations, one based on an MCMC sampling of 
the constrained prior distribution, and the other on the 


MultiNest method, with extensions. These are de- 
scribed in the following two sections IIIB 1 and IIIB2. 


1. LALInferenceNest 


The primary challenge in implementing the nested 
sampling algorithm is finding an efficient means of draw- 
ing samples from the limited prior distribution 


p'{9\Hs) oc 


p{0\Hs) 

0 


L{d\6) > Lmin 
otherwise 


(31) 


In LALInference we build on the previous inspnest 
implementation described in [23], with several enhance- 
ments described here. This uses a short MCMC chain 
(see section III A) to generate each new live point, which 
is started from a randomly-selected existing live point. 

We use proposals of the same form as described in III C 
with slight differences: the differential evolution proposal 
is able to use the current set of live points as a basis for 
drawing a random difference vector, and for empirically 
estimating the correlation matrix used in the eigenvec- 
tor proposal. This ensures that the scale of these jumps 
adapts automatically to the current concentration of the 
remaining live points. In contrast to Eq. (22), the tar- 
get distribution that we are sampling is the limited prior 
distribution p' of Eq. (31), so the acceptance ratio is 

^ Q{e\e')p'{G'\H) 

Q{e'\e)p'{G\H) ■ ^ ^ 

Furthermore, we have introduced additional features 
which help to reduce the amount of manual tuning re- 
quired to produce a reliable result. 

a. Autocorrelation adaptation In [23] it was shown 
that the numerical error on the evidence integral was 
dependent not only on the number of live points Mive 
and the information content of the data (as suggested 
by Skilling), but also on the length of the MCMC sub- 
chains A^mcmc used to produce new samples (this is not 
included in the idealised description of nested sampling, 
since other methods of drawing independent new samples 
are also possible, see section IIIB 2). In inspnest, the 
user would specify this number at the start of the run, 
depending on their desire for speed or accuracy. The 
value then remained constant throughout the run. This 
is inefficient, as the difficulty of generating a new sample 
varies with the structure of the p'{G\Hs) distribution at 
different values of Amin- For example, there may be many 
secondary peaks which are present up to a certain value 
of Amin, but disappear above that, making the distribu- 
tion easier to sample. To avoid this inefficiency (and to 
reduce the number of tuning parameters of the code), we 
now internally estimate the required length of the sub- 
chains as the run progresses. To achieve this, we use the 
estimate of the autocorrelation timescale (defined as in 
Eq. 23) for parameter I of a sub-chain generated from a 
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FIG. 3: Length of MCMC sub-chain for nested sampling 
analysis of the BNS system (as in Table II) as a function of 
prior scale. As the run progresses, the length of the MCMC 
sub-chain used to generate the next live point automatically 
adapts to the current conditions, allowing it to use fewer it- 
erations where possible. The chain is limited to a maximum 
of 5000 iterations. 


randomly selected live point. The sum is computed up to 
the lag Mi which is the first time the correlation drops 
below 0.01, i.e Ci{Mi) < 0.01. The timescale is com- 
puted for each parameter being varied in the model, and 
the longest autocorrelation time is used as the number 
of MCMC iterations (M = max(Mi, . . . ,Mi) for subse- 
quent sub-chains until it is further updated after A^ijve/4 
iterations of the nested sampler. As the chain needed to 
compute the autocorrelation timescale is longer than the 
timescale itself, the independent samples produced are 
cached for later use. We note that as the nested sam- 
pling algorithm uses many live points, the correlation 
between subsequent points used for evaluating the evi- 
dence integral will be further diluted, so this procedure 
is a conservative estimate of the necessary chain thinning. 
The adaptation of the sub-chain length is shown in fig- 
ure 3, where the algorithm adapts to use < 1000 MCMC 
steps during the majority of the analysis, but can adjust 
its chain length to a limit of 5000 samples for the most 
difficult parts of the problem. 

b. Sloppy sampling For the analysis of CBC data, 
the computational cost of a likelihood evaluation com- 
pletely dominates that of a prior calculation, since it re- 
quires the generation of a trial waveform and the calcu- 
lation of an inner product (with possible FFT into the 
frequency domain) . The task of sampling the likelihood- 
limited prior p'{6\H) is performed by sampling from the 
prior distribution, rejecting any points that fall beneath 
the minimum threshold Tmin- During the early stages of 
the run, the Amin likelihood bound encloses a large vol- 
ume of the parameter space, which may take many itera- 
tions of the sub-chain to cross, and a proposed step orig- 
inating inside the bound is unlikely to be rejected by this 



FIG. 4: Acceptance ratio and fraction of sloppy jumps for 
nested sampling analysis of a BNS system. The dashed blue 
line shows the automatically determined fraction of propos- 
als for which the likelihood calculation is skipped. The solid 
green line shows the overall acceptance rate for new live 
points, which thanks to the adaptive jumps remains at a 
healthy level despite the volume of the sampled distribution 
changing by 17 orders of magnitude throughout the run. 


cut. We are free to make a shortcut by not checking the 
likelihood bound at each step of the sub-chain, allowing 
it to continue for ME iterations, where E is the fraction 
of iterations where the likelihood check is skipped. Since 
the calculation of the prior is essentially free compared 
to that of the likelihood, the computational efficiency is 
improved by a factor of {1 — E)~^. The likelihood bound 
is always checked before the sample is finally accepted as 
a new live point. 

Since the optimal value of E is unknown, and will vary 
throughout the run as the Amin contour shrinks the sup- 
port for the p'{0\H) distribution, we adaptively adjust 
it based on a target for the acceptance of proposals at 
the likelihood-cut stage. Setting a target acceptance rate 
of 0.3 at the likelihood cut stage, and having measured 
acceptance rate a, we adjust E in increments of 5% up- 
ward when a > 0.3 or downward when a < 0.3, with a 
maximum of 1. This procedure allows the code to cal- 
culate fewer likelihoods when the proposal distribution 
predominantly falls inside the bounds, which dramati- 
cally improves the efficiency at the start of the run. 

c. Parallelisation Although the nested sampling al- 
gorithm itself is a sequential method, we are able to 
exploit a crude parallelisation method to increase the 
number of posterior samples produced. This involves 
performing separate independent runs of the algorithm 
on different CPU cores, and then combining the results 
weighted by their respective evidence. Consider a set of 
nested sampling runs indexed by *, with each iteration in- 
dexed by j = 1 . . . ^i, where ^i is the number of iterations 
in run i before it terminates, and Zi denotes the evi- 
dence estimate from that run. Our implementation also 
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outputs the iViive live points at the time of algorithm ter- 
mination, which are indexed These last 

samples are treated separately since they are all drawn 
from the same prior volume. The runs must all be per- 
formed with identical data and models, but with different 
random seeds for the sampler. 

For each sample 6ij we calculate the posterior weight 
Wij = LijVijjZi, where logF^ = — j/fViive for the points 
up to j < and Vij = — ^i/Mive for the final points 
j > By resampling any individual chain according 
to the weights Wij we can produce a set of samples from 
the posterior. The resulting sets of posteriors for each 
i are then resampled according to the evidence Zi cal- 
culated for each chain. This ensures that chains which 
fail to converge on the global maximum will contribute 
proportionally fewer samples to the hnal posterior than 
those which do converge and produce a higher Zi esti- 
mate. The resampling processes can be performed either 
with or without replacement, where the latter is useful 
in ensuring that no samples are repeated. In this paper 
independent samples are used throughout, as repeated 
samples will distort the tests of convergence by artifi- 
cially lowering the KS test statistic. 

In practice, this procedure reduces the wall time nec- 
essary to produce a given number of posterior samples, 
as the chains can be spread over many CPU cores. 


2. MultiNest & BAMBI 

MultiNest [24-26] is a generic algorithm that imple- 
ments the nested sampling technique. It uses a model- 
based approach to generate samples within the volume X 
enclosed by the likelihood contour L{X) > Lmm- The set 
of live points is enclosed within a set of (possibly over- 
lapping) ellipsoids and a new point is then drawn uni- 
formly from the region enclosed by these ellipsoids. The 
volume of ellipsoids is used in choosing which to sam- 
ple from and points are tested to ensure that if they lie 
in multiple (N) ellipsoids they are accepted as a sample 
only the corresponding fraction of the time (1/N). The 
ellipsoidal decomposition of the live point set is chosen 
to minimize the sum of volumes of the ellipsoids. This 
method is well suited to dealing with posteriors that have 
curving degeneracies, and allows mode identification in 
multi-modal posteriors. If there are various subsets of 
the ellipsoid set that do not overlap in parameter space, 
these are identihed as distinct modes and subsequently 
evolved independently. 

MultiNest is able to take advantage of parallel com- 
puting architectures by allowing each CPU to compute 
a new proposal point. As the run progresses, the actual 
sampling efficiency (fraction of accepted samples from 
total samples proposed) will drop as the ellipsoidal ap- 
proximation is less exact and the likelihood constraint 
on the prior is harder to meet. By computing N samples 
concurrently, we can obtain speed increases of up to a 
factor of N with the largest increase coming when the 


efficiency drops below 1/7V. 

The user only needs to tune a few parameters for any 
specific implementation in addition to providing the log- 
likelihood and prior functions. These are the number 
of live points, the target efficiency, and the tolerance. 
The number of live points needs to be enough that all 
posterior modes are sampled (ideally with at least one 
live point in the initial set) and we use from 1000 to 
5000 for our analyses. The target efficiency affects how 
conservatively the ellipsoidal decomposition is made and 
a value of 0.1 (10%) was found to be sufficient; smaller 
values will produce more precise posteriors but require 
more samples. Lastly, a tolerance of 0.5 in the evidence 
calculation is sufficiently small for the run to converge to 
the correct result. 

MultiNest is implemented for LALInference within 
the Blind Accelerated Multimodal Bayesian Inference 
(BAMBI) algorithm [27]. BAMBI incorporates the 
nested sampling performed by MultiNest along with 
the machine learning of SkyNet [68] to learn the like- 
lihood function on-the-fly. Use of the machine learning 
capability requires further customisation of input settings 
and so is not used for the purposes of this study. 


C. Jump Proposals 

For both the MCMC sampler and the MCMC- 
subchains of the Nested Sampler, efficiently exploring the 
parameter space is essential to optimising performance 
of the algorithms. Gaussian jump proposals are typi- 
cally sufficient for unimodal posteriors and spaces with- 
out strong correlations between parameters, but there are 
many situations where strong parameter correlations ex- 
ist and/or multiple isolated modes appear spread across 
the multi-dimensional parameter space. When param- 
eters are strongly correlated, the ideal jumps would be 
along these correlations, which makes ID jumps in the 
model parameters very inefficient. Furthermore to sam- 
ple between isolated modes, a chain must make a large 
number of improbable jumps through regions of low prob- 
ability. To solve this problem we have used a range of 
jump proposals, some of which are specihc to the CBC 
parameter estimation problem and some of which are 
more generally applicable to multimodal or correlated 
problems. 

To ensure that an MCMC equilibrates to the target dis- 
tribution, the jump proposal densities in Eq. (22) must 
be computed correctly. Our codes achieve this using a 
“proposal cycle.” At the beginning of a sampling run, 
the proposals below are placed into an array (each pro- 
posal may be put multiple times in the array, according 
to a pre-specified weight factor). The order of the ar- 
ray is then permuted randomly before sampling begins. 
Throughout the run, we cycle through the array of pro- 
posals (maintaining the order), computing and applying 
the jump proposal density for the chosen proposal at each 
step as in Eq. (22). This procedure ensures that there is 
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only a single proposal “operating” for each MCMC step, 
simplifying the computation of the jump proposal den- 
sity, which otherwise would have to take into account the 
forward and reverse jump probabilities for all the propos- 
als simultaneously. 

Differential Evolution 

Differential evolution is a generic technique that at- 
tempts to solve the multimodal sampling problem by 
leveraging information gained previously in the run [69, 
70]. It does so by drawing two previous samples 6i and 
62 from the chain (for MCMC) or from the current set 
of live points (nested sampling), and proposing a new 
sample 6' according to: 

0' = e + q(02-0i) , (33) 

where 7 is a free coefficient. 50% of the time we use this 
as a mode- hopping proposal, with 7 = 1. In the case 
where 61 and 9 are in the same mode, this proposes a 
sample from the mode containing 02- The other 50% of 
the time we choose 7 according to 

7- A^(0,2.38/\/2IVdin,) , (34) 

where iVdim is the number of parameter space dimensions. 
The scaling of the distribution for 7 is suggested in ter 
Braak and Vrugt [70] following Roberts and Rosenthal 
[71] for a good acceptance rate with general distribu- 
tions. The differential evolution proposal in this latter 
mode proves useful when linear correlations are encoun- 
tered in the distribution, since the jump directions tend 
to lie along the principal axes of the posterior distribu- 
tion. However, this proposal can perform poorly when 
the posterior is more complicated. 

Drawing from the past history of the chain for the 
MCMC differential evolution proposal makes the chain 
evolution formally non-Markovian. However, as more 
and more points are accumulated in the past history, each 
additional point accumulated makes a smaller change to 
the overall proposal distribution. This property is suffi- 
cient to make the MCMC chain asymptotically Marko- 
vian, so the distribution of samples converges to the tar- 
get distribution; in the language of Roberts and Rosen- 
thal [72], Theorem 1, 0 in probability as n — >■ 00 

for this adaptive proposal, and therefore the posterior is 
the equilibrium distribution of this sampling. 

Eigenvector jump 

The variance-covariance matrix of a collection of repre- 
sentative points drawn from the target distribution (the 
current set of nested sampling live points) can be used 
as an automatically self-adjusting proposal distribution. 
In our implementation, we calculate the eigenvalues and 


eigenvectors of the estimated covariance matrix, and use 
these to set a scale and direction for a jump proposal. 
This type of jump results in a very good acceptance rate 
when the underlying distribution is approximately Gaus- 
sian, or is very diffuse (as in the early stages of the nested 
sampling run). In the nested sampling algorithm, the co- 
variance matrix is updated every iV;i„e/4 iterations to 
ensure the jump scales track the shrinking scale of the 
target distribution. Within each sub-chain the matrix is 
held constant to ensure detailed balance. 

Adaptive Gaussian 

We also use a 1 dimensional Gaussian jump proposal, 
where the jump for a single parameter 9^ is 9'j. = 9k + 
iV(0, cTfc). The width of the proposal is scaled to achieve 
a target acceptance rate of ^ ~ 0.234 by adjusting 

CTfc ^ CTfc -I- (3^) 

when a step is accepted, where is a scaling factor and 
A is the prior width in the fcth parameter, and adjusting 

(Tfc ^ (Tfe — (36) 

fc fc 7 100 ^ ^ 

when a step is rejected. For the MGMG, the adap- 
tation phase lasts for 100,000 samples, and = 
10 (t — — 1 during this phase; otherwise = 0. 

The nested sampling algorithm has s-y = 1. 

Gravitational-wave specific proposals 

We also use a set of jump proposals specific to the CBG 
parameter estimation problem addressed in this work. 
These proposals are designed to further improve the sam- 
pling efficiency by exploring known structures in the CBG 
posterior distribution, primarily in the sky location and 
extrinsic parameter sub-spaces. 

Sky location Determining the sky position of the CBG 
source is an important issue for followup observations of 
any detected sources. The position, parameterised by 
(a, S, dff), is determined along with the other parameters 
by the LALInference code, but it can present difficul- 
ties due to the highly structured nature of the posterior 
distribution. Although the non-uniform amplitude re- 
sponse of a single detector allows some constraint of the 
sky position of a source, the use of a network of detectors 
gives far better resolution of the posterior distribution. 
This improvement is heuristically due to the ability to 
resolve the difference in time of arrival of the signal at 
each detector, which allows triangulation of the source 
direction. The measured amplitude of the signal and 
the non-uniform prior distribution further constrain the 
posterior, but the major structure in the likelihood can 
be derived by considering the times of arrival in multi- 
ple detectors. This leads us to include two specific jump 
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proposals similar to those outlined in [23] , which preserve 
the times of arrival in two and three detector networks 
respectively. 

Sky Reflection In the case of a three-detector network, 
the degeneracy of the ring based on timing is bro- 
ken by the presence of a third detector. In this case, 
there are two solutions to the triangulation prob- 
lem which correspond to the true source location, 
and its reflection in the plane containing the three 
detector sites. If the normal vector to this plane 
is h, the transition (in Cartesian coordinates with 
origin at the geocentre) between the true point x 
and its reflection x' is written 

x' = X — 2fi\n.[x — Xi)\ (37) 

where Xi is the unit vector pointing in the direction 
of one of the detector sites. The resulting point is 
then projected back onto the unit sphere parame- 
terised by a, (5. To ensure detailed balance, the re- 
sulting point is perturbed by a small random vector 
drawn from a 3D Gaussian in (t, a, 6) The time pa- 
rameter is updated in the same way as for the sky 
rotation proposal above. As in the two-detector 
case, the degeneracy between these points can be 
broken by consideration of the signal amplitudes 
observed in the detector, however this is not al- 
ways the case as the secondary mode can have a 
similar likelihood. 

Extrinsic parameter proposals 

Extrinsic Parameter proposal There exist a correla- 
tion between the inclination, distance, polarization 
and the sky location dues to the sensitivity of the 
antenna beam patterns of the detectors. This corre- 
lation makes the two solutions for the sky location 
from the thee-detector network (described above) 
correspond to different values of inclination, dis- 
tance and polarization. We solve analytically the 
values of those parameters when trying to jump 
between the two sky reflections. The equations are 
detailed in [73]. 

Polarization and Phase correlation There exists a 
degeneracy between the 4> and '0 parameters when 
the orbital plane is oriented perpendicular to the 
line of signal, i.e. t = {0,7r}. In general these 
parameters tend to be correlated along the axes 
a = ijj + 4> s-nd P = iIj ~ We propose jumps 
which choose a random value of either the a oi P 
parameter (keeping the other constant) to improve 
the sampling of this correlation. 

Miscellaneous proposals 

Draw from Prior A proposal that generates samples 
from the prior distribution (see section II C) by re- 
jection sampling. This is mostly useful for improv- 
ing the mixing of high-temperature MCMC chains, 
as it does not depend on the previous iteration. 


Phase reversal Proposes a change in the orbital phase 
parameter 4>j+i = + tt) (mod 27 t), which will 

keep the even harmonics of the signal unchanged, 
but will flip the sign of the odd harmonics. Since 
the even harmonic / = m = 2 dominates the signal, 
this is useful for proposing jumps between multi- 
ple modes which differ only by the relatively small 
differences in the waveform generated by the odd 
harmonics. 

Phase and polarization reversal Proposes a simulta- 
neous change of the orbital phase and polarisa- 
tion parameters 4>j+i = [4>j + ’’’) (mod 27 t) and 
= (0j -I- 7t/2) (mod tt). 

Gibbs Sampling of Distance The conditional likeli- 
hood of the distance parameter follows a known 
form, which allows us to generate proposals from 
this distribution independently of the previous iter- 
ation, reducing the correlation in the chains. As the 
signal amplitude scales proportionally to = u, 
the logarithm of the likelihood function (Equation 
(10)), constrained to only distance variations, is 
quadratic in u, 

log Liu) = A + Bu + Cu^, (38) 

which in our case yields a Gaussian distribution 
with mean /i = —B/2C and variance cr^ = 1/2C. 
By calculating the value of log L at three different 
distances, the quadratic coefficients are found and 
a new proposed distance can be generated from the 
resulting Gaussian distribution. 

IV. POST-PROCESSING 

The main data products of all the above algorithms 
are a set of ‘samples’ assumed to be drawn independently 
from the posterior probability distribution p[0\d^I) (as 
defined in Equation (1)) and, for the nested sampling al- 
gorithms, an approximation to the evidence Z = P(d\I) 
(for MGMG, evidence computation is performed in post- 
processing, see section IV G). Each algorithm initially 
produces outputs which are different in both their form 
and relation to these quantities. A suite of Python scripts 
has been specifically developed for the purpose of con- 
verting these outputs to a common results format in or- 
der to facilitate comparisons between the algorithms and 
promote consistency in the interpretation of results. At 
the time of writing these scripts (and associated libraries) 
can be found in the open-source LALsuite package [47]. 
The end result of this process is a set of web-ready HTML 
pages containing the key meta-data and statistics from 
the analyses and from which it should be possible to re- 
produce any results produced by the codes. In this sec- 
tion we outline in more detail the steps needed to convert 
or post-process the output of the different algorithms to 
this common results format and important issues related 


15 


to interpreting these results and drawing scientific con- 
clusions. 


A. MCMC 

The MCMC algorithm in LALInference produces a 
sequence of 0(10®) - 0(10®) samples, depending on the 
number of source parameters in the model, the number 
of interferometers used, and the bandwidth of the signal. 
Each sample consists of a set of source parameters {6} 
and associated values of the likelihood function L{d\9) 
and prior p{9). We cannot immediately take this output 
sequence to be our posterior samples as we cannot assume 
that all the samples were drawn independently from the 
actual posterior distribution. 

In order to generate a set of independent posterior sam- 
ples the post-processing for the MCMC algorithm first 
removes a number of samples at the beginning of the 
chain - the so-called ‘burn-in’ - where the MCMC will 
not yet be sampling from the posterior probability den- 
sity function. For a d-dimensional parameter space, the 
distribution of the log-likelihood is expected to be close 
to £max — X, where £max is the maximum achievable 
log-likelihood, and A is a random variable following a 
Gamma(d/2, 1) distribution [74]. Thus, we consider the 
burn-in to end when a chain samples log-likelihood val- 
ues that are within d/2 of the highest log- likelihood value 
found by the chain. Once we have discarded these sam- 
ples, the set of remaining samples is then ‘down-sampled’; 
the chain is re-sampled randomly at intervals inversely 
proportional to the the autocorrelation length to produce 
a sub-set of samples which are assumed to be drawn in- 
dependently from the posterior distribution. See section 
III A above for more details. 


B. Nested sampling 

The output of both of the nested sampling algorithms 
in LALInference are a list (or lists in the case of parallel 
runs) of the live points sampled from the prior distribu- 
tion for a particular model and data set and consisting 
of a set of parameters and their associated \og{Lij) and 
Zij. These live points approximately lie on the contours 
enclosing the nested prior volumes and each has associ- 
ated with it some fraction of the evidence assumed to be 
enclosed within said contour. The post-processing step 
takes this information and uses it to generate posterior 
samples from the list of retained live points using Eq. 30 
for single runs, along with the procedure described in 
section IIIB 1 c for parallel runs. 

C. Evidence calculation using MCMC outputs 

Whilst the nested sampling algorithms in 
LALInference directly produce an approximation 


to the value of the evidence Z (and produce posterior 
samples as a by-product), we can also use the output 
from the MCMC algorithms to calculate independent 
estimates of Z in post-processing. We have tested 
several methods of computing the evidence from pos- 
terior samples, including the harmonic mean [75-77], 
direct integration of an estimate of the posterior density 
[78], and thermodynamic integration (see e.g. [79, 80]). 
We have found that only thermodynamic integration 
permits reliable estimation of the evidence for the typical 
number and distribution of posterior samples we obtain 
in our analyses. 

Thermodynamic integration considers the evidence as 
a function of the temperature, Z(/3|i7), defined as 

Z{(3\H) = J d9p{d\H,9,/3)p{9\H) 

= J d9p{d\H,9fp{9\H) (39) 

where j3 = 1/T is the inverse temperature of the chain. 
Differentiating with respect to /3, we find 

^lnZ(/3|i7) = (lnp(d|iJ,0))^ (40) 

where (\np{d\H^9)) p is the expectation value of the log 
likelihood for the chain with temperature 1//3. We can 
now integrate (40) to hnd the logarithm of the evidence 

lnZ= / d/3 {\r,p{d\H,9))p. (41) 

It is straightforward to compute (lnp(d|i7, 0))^ for each 
chain in a parallel-tempered analysis; the integral in Eq. 
(41) can then be estimated using a quadrature rule. Be- 
cause our typical temperature spacings are coarse, the 
uncertainty in this estimate of the evidence is typically 
dominated by discretisation error in the quadrature. We 
estimate that error by performing the quadrature twice, 
once using all the temperatures in the chain and once 
using half the temperatures. To achieve very accurate 
estimates of the evidence, sometimes ^ 20 to ~ 30 tem- 
peratures are needed, out to a maximum of j3~^ ^ 10®, 
which adds a significant cost over the computations nec- 
essary for parameter estimation; however, reasonably ac- 
curate estimates of the evidence can nearly always be 
obtained from a standard run setup with ^ 10 chains. 
Figure 5 plots the integrand of Eq. (41) for the synthetic 
GW signals analysed in § V B, illustrating both the coarse 
temperature spacing of the runs and the convergence of 
the evidence integral at high temperature. 

D. Generation of statistics and marginal posterior 
distributions 

Whilst the list of posterior samples contains all the 
information about the distribution of the source param- 
eters obtained from the analysis, we need to make this 
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FIG. 5: The integrand of the evidence integral (Eq. (41)) 

versus P for the analyses of synthetic GW signals in § VB. 
The evidence is given by the area nnder each curve. Table I 
gives the results of the integration together with the estimated 
error in the quadrature, following the procedure described in 
§ IV G. The jaggedness of the curves illustrates that the tem- 
perature spacing required for convergent MCMC simulations 
is larger than that required for accurate quadrature to com- 
pute the evidence; the flatness at small P illustrates that, for 
these simulations, the high-temperature limit is sufficient for 
convergence of the evidence integral. 

more intelligible by summarising it in an approximate 
way. We have developed a number of different sum- 
mary statistics which provide digested information about 
the posterior distributions, which are applied in post- 
processing to the output samples. 

The simplest of these are simply the mean and stan- 
dard deviation of the one-dimensional marginal distribu- 
tions for each of the parameters. These are estimated 
as the sample mean, standard deviation, etc., over the 
samples, which converge on their continuous distribution 
equivalents (3) in the limit of large numbers of samples. 
These are particularly useful for giving simple measures 
of the compatibility of the results with the true values, if 
analysing a known injection. 

However, estimators are not always representative of 
the much larger amount of information contained in the 
marginal posterior distributions on each of the parame- 
ters (or combinations of them). For summarising one- or 
two-dimensional results we create plots of the marginal 
posterior probability density function by binning the 
samples in the space of the parameters and normalising 
the resulting histogram by the number of samples. 

We are also interested in obtaining estimates of the 
precision of the resulting inferences, especially when com- 


paring results from a large number of simulations to ob- 
tain an expectation of parameter estimation performance 
under various circumstances. We quantify the precision 
in terms of ‘credible intervals’, defined for a desired level 
of credibility (e.g. Pcred = 95% probability that the pa- 
rameter lies within the interval), with the relation 

credible level = / p{9\d)d9. (42) 

credible interval 

The support of the integral above is the credible in- 
terval, however this is not defined uniquely by this ex- 
pression. In one dimension, we can easily find a region 
enclosing a fraction x of the probability by sorting the 
samples by their parameter values and choosing the range 
from [N{1 — x) /2, N{1 + x) /2] where N is the number of 
independent samples in the posterior distribution. The 
statistical error on the fraction x of the true distribu- 
tion enclosed, caused by the approximation with discrete 
samples is « ^Jx{\ — x)/N. To achieve a 1% error in the 
90% region we therefore require 900 independent sam- 
ples. Typically we collect a few thousand samples, giving 
an error < 1% on the credible interval. 

We are also interested in the minimum credible inter- 
val, which is the smallest such region that encloses the 
desired fraction of the posterior. In the case of a uni- 
modal one-dimensional posterior this leads to the highest 
posterior density interval. 

To find estimates of the minimum credible intervals we 
use a number of techniques that have different regimes of 
usefulness, depending primarily on the number of sam- 
ples output from the code and the number of parameters 
we are interested in analysing conjointly. 

When we are considering the one-dimensional marginal 
posterior distributions, we simply compute a histogram 
for the parameter of interest using equally-sized bins. 
This directly tells us the probability associated with that 
region of the parameter space: the probability density 
is approximately equal to the fraction of samples in the 
bin divided by the bin width. This simple histogram 
method involves an appropriate choice of the bin size. 
We must be careful to choose a bin size small enough 
that we have good resolution and can approximate the 
density as piecewise constant within each bin, but large 
enough so that the sampling error within each bin does 
not overwhelm the actual variations in probability be- 
tween bins. 

To recover the minimum credible interval we apply a 
greedy algorithm to the histogram bins. This orders the 
bins by probability, and starting from the highest prob- 
ability bin, works its way down the list of bins until the 
required total probability has been reached. Although 
this procedure generally yields satisfactory results, it is 
subject to bias due to the discrete number of samples per 
bin. To see this, consider a uniform probability distribu- 
tion that has been discretely sampled. The statistical 
variation of the number of samples within bins will cause 
those where the number fluctuates upward to be chosen 
before those where it fluctuates downward. The credi- 
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Distribution 

Nested 

Analytic Sampling 

BAMBI 

MCMC 

thermo. 

Unimodal 

-21.9 

-21.8 ±0.1 

-21.8 ±0.12 

-20.3 ± 1.9 

Bimodal 

-30.0 

-30.0 ±0.1 

-29.9 ±0.14 

-26.7 ±3.0 

Rosenbrock 

- 

-70.9 ±0.2 

-69.1 ± 0.2 

-63.0 ±7.6 

BNS 

- 

68.7 ±0.34 

69.98 ±0.17 

68.2 ± 1.1 

NSBH 

- 

62.2 ±0.27 

63.67 ±0.16 

63.40 ±0.72 

BBH 

- 

71.4 ±0.18 

72.87 ±0.15 

72.44 ±0.11 


TABLE I: The log evidence estimates for the analytic likeli- 
hood distributions (§V A) and the simulated signals (§ V B) 
calculated with the three methods, with estimated uncer- 
tainty. For the thermodynamic integration method we used 16 
steps on the temperature ladder, except for the Rosenbrock 
likelihood which required 64. For distributions that permit 
an analytic computation of evidence, the samplers produce 
evidence estimates consistent with the true value. For the 
others, the estimates produced by the samplers are not con- 
sistent, indicating that there remains some systematic error in 
the evidence calculation methods for the more difficult prob- 
lems. 


ble interval estimated by this method will therefore be 
smaller than the true interval containing the desired pro- 
portion of the probability. In [81] we investigate several 
methods of overcoming this problem. 


V. VALIDATION OF RESULTS 

To confirm the correctness of the sampling algorithms, 
we performed cross-comparisons of recovered posterior 
distributions for a variety of known distributions and ex- 
ample signals. The simplest check we performed was 
recovery of the prior distribution, described in section 
II C. The one-dimensional distributions output by the 
codes were compared using a Kolmogorov-Smirnov test, 
where the comparisons between the three codes on the 
15 marginal distributions were all in agreement with p- 
values above 0.02. We next analysed several known like- 
lihood functions, where we could perform cross-checks 
between the samplers. These were a unimodal 15- 
dimensional correlated Gaussian, a bimodal correlated 
Gaussian distribution, and the Rosenbrock banana func- 
tion. For the unimodal and bimodal distributions we can 
also compare the results of the samplers to the analytical 
marginal distributions to confirm they are being sampled 
correctly. 


A. Analytic likelihoods 

The multivariate Gaussian distribution was specified 
by the function 

logLMV = - - Sj)- (43) 


where Cij is a covariance matrix of dimension 15, and 
the mean values 9i are chosen to lie within the usual 
ranges, and have the usual scales, as in the GW case. 
Cij was chosen so that its eigenvectors do not lie parallel 
to the axes defined by the parameters 9i, and the ratio 
of the longest to shortest axis was ~ 200. The evidence 
integral of this distribution can be computed to good 
approximation over a prior domain bounded at 5a using 
the determinant of the covariance matrix and the prior 
volume V, Zmv = det 

The bimodal distribution was composed of two copies 
of the unimodal multivariate Gaussian used above, with 
two mean vectors 9i and \i separated by 8cr, as defined 
by Cij. Using a bounding box at ±9 <t about the mid- 
point of the two modes, the evidence is calculated as 

7 ' ~ „- 30.02 

The Rosenbrock “banana” function is a commonly 
used test function for optimisation algorithms [82]. 
For this distribution, we do not have analytic one- 
dimensional marginal distributions to compare to, or 
known evidence values, so we were only able to do cross- 
comparisons between the samplers. 

Each sampler was run targeting these known distri- 
butions, and the recovered posterior distributions and 
evidences were compared. The posterior distributions 
agreed for all parameter as expected, and an example of 
one parameter is shown in figure 6. 

The recovered evidence values are shown in table I. 
For the MCMC sampler the quoted errors come from the 
thermodynamic integration quadrature error estimates 
described in §IVC; for the nested samplers the quoted 
errors are estimated by running the algorithm multiple 
times and computing the standard deviation of the re- 
sults. For the simplest unimodal and bimodal distribu- 
tions we see excellent agreement between the sampling 
methods, which agree within the Icr statistical error esti- 
mates. The more difficult Rosenbrock likelihood results 
in a statistically significant disagreement between the 
nested sampling and BAMBI algorithms, with BAMBI 
returning the higher evidence estimate. To highlight the 
difficulty, for this problem the thermodynamic integra- 
tion methods used with MGMG required 64 temperature 
ladder steps to reach convergence to /3(logL) = 0 at high 
temperatures, as opposed to the 16 used in the other 
problems. This pattern is repeated in the evidence for 
the signals, where there is a difference of several stan- 
dard deviations between the methods. 


B. Simulated GW signals 

As an end-to-end test, we ran all three sampling 
flavours of LALInference {MCMC, section III A; Nest, 
section IIIB 1 and BAMBI, section IIIB 2) on three test 
signals, described in table II. These signals were injected 
into coloured Gaussian noise of known power spectrum 
and recovered with the same approximant used in gen- 
erating the injection, listed in table II. Since we used 
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FIG. 6: Example comparing cumulative distributions for the analytic likelihood functions for each sampler for the (arbitrary) 
mi parameter for the three test likelihood functions. The samplers are shown as Nest:purple left hatches, MCMC: green 
horizontal hatches BAMBI: blue right hatches, with the true cumulative distributions shown in red where available, (left) uni- 
modal multivariate Gaussian distribution (middle) bimodal distribution (right) Rosenbrock distribution. The different methods 
show good agreement with each other, and with the known analytic distributions. Vertical dashed lines indicate the 5%-95% 
credibility interval for each method. 


inspiral-only waveforms models for both injection and 
recovery, there is a sharp cutoff in the signal above the 
waveform’s termination frequency. It has been shown 
that in some circumstances the presence of this cut- 
off provides an artificially sharp feature which can im- 
prove parameter estimation beyond that of a realistic sig- 
nal [83]. Nonetheless, since the focus of this study is the 
consistency of the algorithms, we can proceed to use the 
sharply terminating waveforms for comparison purposes. 

Figures 7, 8 and 9 show two-dimensional 90% credible 
intervals obtained by all three samplers on various com- 
binations of parameters. Figure 7 (see table II) shows 
the typical posterior structure for a BNS system. We 
show only three two-dimensional slices through the nine- 
dimensional (non-spinning) parameter space, highlight- 
ing the most relevant parameters for an astrophysical 
analysis. Selected one-dimensional 90% credible inter- 
vals are shown in table III. This is the least challenging of 
the three example signals, since we restrict the model to 
non-spinning signals only. The posterior PDFs show ex- 
cellent agreement between the sampling methods. In the 
leftmost panel we show the recovered distribution of the 
masses, parametrised by the chirp mass and symmetric 
mass ratio. This shows the high accuracy to which the 
chirp mass can be recovered compared to the mass ra- 
tio, which leads to a high degree of correlation between 
the estimated component masses. The domain of the 
prior ends at a maximum of 77 = 0.25, which corresponds 
to the equal mass conhguration. In the central panel 
we show the estimated sky location, which is well deter- 
mined here thanks to the use of a three-detector network. 
In the rightmost panel, the correlation between the dis- 
tance and inclination angle is visible, as both of these 
parameter scale the effective amplitude of the waveform. 
The reflection about the = 7t/2 line shows the de- 
generacy which is sampled efficiently using the extrinsic 
parameter jump proposals III C. 


Similarly to Figure 7, Figure 8 (see table II) shows the 
posterior for a NSBH system. This signal was recovered 
using a spin-aligned waveform model, and we show six 
two-dimensional slices of this eleven-dimensional param- 
eter space. Selected one-dimensional 90% credible inter- 
vals are shown in table IV. The top-left panel shows the 
M. — Tj distribution; in comparison to Figure 7 the mass 
ratio is poorly determined. This is caused by the cor- 
relation between the 77 parameter and the aligned spin 
magnitudes, which gives the model greater freedom in 
fitting 77 , varying oi and 02 to compensate. This cor- 
relation is visible in the bottom-right panel. The other 
panels on the bottom row illustrate other correlations 
between the intrinsic parameters. The top-right panel 
shows the correlation between distance and inclination, 
where in this case the spins help break the degeneracy 
about the Oj^ = tt/2 line. 

Lastly, figure 9 (see table II) shows the posterior for 
a BBH system, recovered taking into account precession 
effect from two independent spins. We show nine two- 
dimensional slices of this fifteen-dimensional parameter 
space. One-dimensional 90% credible intervals are shown 
in table V. In addition to the features similar to figure 7 
in the top row, correlations with spin magnitudes (middle 
row) and tilt angles (bottom row) are shown. Note that 
the injected spin on the first component is almost anti- 
aligned with the orbital angular momentum, such that 
the tilt angle = 3.1, an unlikely random choice. This 
angle has a low prior probability, and as a result the in- 
jected value lies in the tails of the posterior distribution. 
This has repercussions in the recovered distributions for 
the spin magnitude and mass ratio, since they are par- 
tially degenerate in their effect on the phase evolution of 
the waveform, which results in the true value also being 
located in the tails of these distributions. 

In all three cases, the three independent sampling al- 
gorithms converge on the same posterior distributions, 
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Fig. 

Name 

Approximant 

mi 

(Mo) 

m2 

(Mo) 

ffli 

fl2 

(Rad) 

^2 

(Rad) 

i 

(Rad) 

distance 

(Mpc) 

Network SNR 

7 

BNS 

TaylorF2 3.5PN 

1.3382 

1.249 

0 

0 

- 

- 

2.03 

135 

13 

8 

NSBH 

SpinTaylorTI 3.5PN 

15 

1.35 

0.63 

0 

0 

- 

1.02 

397 

14 

9 

BBH 

SpinTaylorT4 3.5PN 

15 

8 

0.79 

0.8 

3.1 

1.44 

2.307 

500 

15 


TABLE II: Details of the injected signals used in section V B, showing the waveform approximant used with the masses 
spin magnitudes and tilt angles (u{i, 2 } , f{i, 2 })j and the distance and inclination (t). 


indicating that the algorithms can reliably determine the 
source parameters, even for the full 15-dimensional spin- 
ning case. 

We also computed the evidence for each signal, rela- 
tive to the Gaussian noise hypothesis, using each sampler, 
with errors computed as in § V A. The results in table I 
show that the two flavours of nested sampling produce 
more precise estimates, according to their own statisti- 
cal error estimates, but they disagree in the mean value. 
The thermodynamic integration method used with the 
MCMC algorithm (with 16 steps on the temperature lad- 
der), produces a larger statistical error estimate, which 
generally encloses both the nested sampling and BAMBI 
estimates. These results indicate that there remains some 
systematic disagreement between the different methods 
of estimating evidence values, despite the good agree- 
ment between the posteriors. The BAMBI method gen- 
erally produces a higher evidence estimate compared to 
the nested sampling approach, by around a factor of e. 
This indicates that further improvement is necessary be- 
fore we can rely on these methods to distinguish models 
which are separated by evidence values lower than this 
factor. 


C. Confidence intervals 

Having checked the agreement of the posterior distri- 
butions on three selected injections, we performed a fur- 
ther check to ensure that the probability distributions 
we recover are truly representative of the confidence we 
should hold in the parameters of the signal. In the ideal 
case that our noise and waveform model matches the 
signal and noise in the data, and our prior distribution 
matches the set of signals in the simulations, then the 
recovered credible regions should match the probability 
of hnding the true signal parameters within that region. 
By setting up a large set of test signals in simulated noise 
we can see if this is statistically true by determining the 
frequency with which the true parameters lie within a 
certain confidence level. This allows us to check that our 
credible intervals are well calibrated, in the sense of [84]. 

For each run we calculate credible intervals from the 
posterior samples, for each parameter. We can then ex- 
amine the number of times the injected value falls within 
a given credible interval. If the posterior samples are an 
unbiased estimate of the true probability, then 10% of the 
runs should find the injected values within a 10% credi- 


ble interval, 50% of runs within the 50% interval, and so 
on. 

We perform a KS-test on whether the results match the 
expected 1 to 1 relation between the fraction of signals in 
each credible region, and the level associated with that 
region. 

For 1 dimensional tests our credible regions are defined 
as the connected region from the lowest parameter value 
to the value where the integrated probability reaches the 
required value. In practice we order the samples by pa- 
rameter value and query what fraction of this list we 
count before passing the signal value. 

To perform this test, we drew 100 samples from the 
prior distribution of section II C, providing a set of injec- 
tions to use for the test. This was performed using the 
TaylorF2 waveform approximant for both injection and 
recovery, with simulated Gaussian data using the initial 
LIGO and Virgo noise curves and 3 detector sites. 

We calculated the cumulative distribution of the num- 
ber of times the true value for each parameter was found 
within a given credible interval p, as a function of p, 
and compared the result to a perfect 1 — 1 distribution 
using a KS test. All three codes passed this test for 
all parameters, indicating that our sampling and post- 
processing does indeed produce well-calibrated credible 
intervals. Figure 10 shows an example of the cumula- 
tive distribution of p- values produced by this test for the 
distance parameter. Similar plots were obtained for the 
other parameters. 

VI. COMPUTATIONAL PERFORMANCE 

We have benchmarked the three samplers using the 
three GW events described in section VB. Although the 
specific performances listed are representative only of 
these signals, they do provide a rough idea of the rel- 
ative computational performance of the sampling meth- 
ods and the relative difficulty in the BNS, NSBH and 
BBH analyses, when running in a typical configuration. 
The computational cost of a parameter estimation run is 
strongly dependent on two main factors: the waveform 
family used (see sec. II B) and the structure of the param- 
eter space. Prohling of the codes show that computation 
of waveforms is the dominating factor, as the calculation 
of the phase evolution at each frequency bin is relatively 
expensive compared to the computation of the likelihood 
once the template is known. 
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FIG. 7: Comparison of probability density functions for the BNS signal (table II) as determined by each sampler. Shown are 
selected 2D posterior density functions in greyscale, with red cross-hairs indicating the true parameter values, and contours 
indicating the 90% credible region as estimated by each sampler. On the axes are superimposed the one-dimensional marginal 
distributions for each parameter, as estimated by each sampler, and the true value indicated by a vertical red line. The colours 
correspond to blue: Bambi, magenta: Nest, green: MCMC. (left) The mass posterior distribution parametrized by chirp mass 
and symmetric mass ratio, (centre) The location of the source on the sky. (right) The distance dr and inclination Ojn of the 
source showing the characteristic V-shaped degeneracy. 



FIG. 8: Gomparison of probability density functions for the NSBH signal (table II), with same color scheme as fig 7. (first row 
left) The mass posterior distribution parametrized by chirp mass and symmetric mass ratio, (first row centre) The location of 
the source on the sky. (first row right) The distance di, and inclination 9 jn of the source. In this case the V-shaped degeneracy 
is broken, but the large correlation between d_r and 6 jn remains, (second row left) The spin magnitudes posterior distribution, 
(second row centre) The spin and mass of the most massive member of the binary illustrating the degeneracy between mass 
and spin, (second row right) The spin and symmetric mass ratio. 
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TABLE III: BNS recovered parameters. Median values and 5% — 95% credible interval for a selection of parameters for each 
of the sampling algorithms. 
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TABLE IV: NSBH recovered parameters, defined as above. 
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TABLE V: BBH recovered parameters, defined as above. 


The computationally easiest waveform to generate is 
TaylorF2, where an analytic expression for the waveform 
in the frequency domain is available. For the BNS signal 
simulated here, around 50 waveforms can be generated 
per second at our chosen configuration (32 s of data sam- 
pled at 4096 Hz). On the other hand, more sophisticated 
waveforms, like SpinTaylorT4 with precessing spins, re- 
quire solving differential equations in the time domain, 
and a subsequent FFT (the likelihood is always calcu- 
lated in the frequency domain), which raises the CPU 
time required to generate a single waveform by an order 
of magnitude. 

The structure of the parameter space affects the length 
of a run in several ways. The hrst, and most obvious, is 
through the number of dimensions: when waveforms with 
precessing spins are considered a 15-dimension parame- 
ter space must be explored, while in the simpler case of 
non-spinning signals the number of dimensions is 9. The 
duration of a run will also depend on the correlations 
present in the parameter space, e.g. between the distance 
and inclination parameters [38] . Generally speaking runs 
where correlations are stronger will take longer to com- 
plete as the codes will need more template calculations 
to effectively sample the parameter space and find the 
region of maximum likelihood. 

Table VI shows a comparison of the efficiency of each 
code running on each of the simulated signals in terms of 
the cost in CPU time, wall time, and the CPU/wall time 
taken to generate each sample which ended up in the 
posterior distribution. These numbers were computed 
using the same hardware, Intel Xeon E5-2670 2.6 GHz 
processors. 


BNS 

Bambi 

Nest 

MCMC 

posterior samples 

6890 

19879 

8363 

CPU time (s.) 

3317486 

1532692 

725367 

wall time (s.) 

219549 

338175 

23927 

CPU seconds/sample 

481.5 

77.1 

86.7 

wall seconds/sample 

31.9 

17.0 

2.9 

NSBH 

Bambi 

Nest 

MCMC 

posterior samples 

7847 

20344 

10049 

CPU time (s.) 

2823097 9463805 

4854653 

wall time (s.) 

178432 

2018936 

171992 

CPU seconds/sample 

359.8 

465.2 

483.1 

wall seconds/sample 

22.7 

99.2 

17.1 

BBH 

Bambi 

Nest 

MCMC 

posterior samples 

10920 

34397 

10115 

CPU time (s.) 

2518763 

7216335 

5436715 

wall time (s.) 

158681 

1740435 

200452 

CPU seconds/sample 

230.7 

209.8 

537.5 

wall seconds/sample 

14.5 

50.6 

19.8 


TABLE VI: Preformance of all three sampling methods on 
the three signals from table II. The time quoted in the “CPU 
time” line is the cumulative CPU-time across multiple cores, 
while the time quoted in the “wall time” line is the actual 
time taken to complete the sampling. The difference is an in- 
dication of the varying degrees of parallelism in the methods. 


We note that at the time of writing the three samplers 
have different level of parallelization, which explains the 
differences between codes of the ratio CPU time to wall 
time. 
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FIG. 9: Comparison of probability density functions for the BBH signal (table II), with same color scheme as fig 7. (first row 
left) The mass posterior distribution parametrized by chirp mass and symmetric mass ratio, (first row centre) The location of 
the source on the sky. (first row right) The distance dz, and inclination 6 jn of the source showing the degeneracy is broken, 
as in the NSBH case, (second row left) The spins magnitude posterior distribution, (second row centre) The spin and mass of 
the most massive member of the binary illustrating the degeneracy between mass and spin, (second row right) The spin and 
symmetric mass ratio, (third row left) The spins tilt posterior distribution, (third row centre) The spin tilt of the more massive 
member of the binary and the symmetric mass ratio, (third row right) The spin tilt and mass of the most massive member of 
the binary. 


VII. CONCLUSIONS AND FUTURE GOALS 

In this paper we have described the application of three 
stochastic sampling algorithms to the problem of com- 
pact binary parameter estimation and model selection. 


Their implementation in the LALInference package pro- 
vides a flexible and open-source toolkit which builds upon 
much previous work to give reliable results [13-17, 17- 
21, 23, 25-27, 29, 30]. The independent sampling meth- 
ods have allowed us to perform detailed cross-validation 
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p 


FIG. 10: P vs P plot for the distance parameter. On the 

X axis is the probability p contained in a credible interval, 
and on the y axis the fraction of true values which lay inside 
that interval. The diagonal line indicates the ideal distribu- 
tion where credible intervals perfectly reflect the frequency of 
recovered injections. For all three sampling algorithms the 
results are statistically consistent with the diagonal line, with 
the lowest KS statistic being 0.25. 

of the results of inference on a range of GW signals from 
compact binary coalescences, such as will be observed by 
future gravitational-wave detectors. We have also per- 
formed internal consistency checks of the recovered pos- 
terior distributions to ensure that the quoted credible 
intervals truly represent unbiased estimates of the pa- 
rameters under valid prior assumptions. 

The release of the LALInf erence toolkit as part of the 
open-source LAL package, available from [47] , has already 
provided a base for developing methods for testing gen- 
eral relativity [35-37] and performing parameter estima- 
tion on a variety of other GW sources[40, 41]. In the fu- 
ture we intend to further develop the implementation to 
accommodate more sophisticated noise models for data 
analysis in the advanced detector era. This will enable us 
to provide parameter estimation results which are robust 
against the presence of glitches in the data, against time- 
dependent fluctuations in the noise spectrum[42, 43, 45], 
and will allow us to incorporate uncertainty in the cali- 
bration of the instruments. 

Work is also ongoing in improving inference to incor- 
porate systematic uncertainties in the waveform models 
which affect estimates of intrinsic parameters [55]. 

Meanwhile, recent advances in reduced order modelling 
of the waveforms and developments of surrogate mod- 
els for the most expensive waveforms should result in a 


dramatic improvement in the speed of parameter esti- 
mation [85? -87]. More intelligent proposal distribu- 
tions also have the potential to reduce the autocorrela- 
tion timescales in the MGMG and Nested Sampling algo- 
rithms, further improving the efficiency of these methods. 

The work described here should serve as a foundation 
for these further developments, which will be necessary 
to fully exploit the science capabilities of the advanced 
generation of gravitational-wave detectors, and produce 
parameter estimates in a timely manner. 
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